
Add fast local tool search with hybrid mode#47

Merged
vl3c merged 4 commits into main from pr-46
Feb 21, 2026

Conversation


@vl3c vl3c commented Feb 21, 2026

Summary

  • Adds a fast local keyword/category search engine (~0.01ms per query) that eliminates API calls for tool discovery in the common case
  • Default mode is hybrid: uses local search first, falls back to gpt-5-nano API only when confidence is low
  • Switches API fallback model from gpt-4.1-mini to gpt-5-nano (8x cheaper input costs)
  • Includes LRU result cache with 5-minute TTL
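The hybrid dispatch described above can be sketched as follows. This is a minimal illustration, not the actual service code: the function names (`search_local`, `search_api`, `search_hybrid`) and the confidence threshold are assumptions for the example.

```python
# Hypothetical sketch of hybrid-mode dispatch: local keyword search first,
# API fallback only when the local result's confidence is low.
CONFIDENCE_THRESHOLD = 0.6  # assumed cutoff; the real value may differ


def search_local(query: str) -> tuple[list[str], float]:
    """Stand-in local keyword search returning (tool_names, confidence)."""
    # The real implementation scores against inverted indices; faked here.
    if "circle" in query.lower():
        return (["draw_circle"], 0.95)
    return ([], 0.0)


def search_api(query: str) -> list[str]:
    """Stand-in for the gpt-5-nano semantic fallback (no network here)."""
    return ["generic_tool"]


def search_hybrid(query: str) -> list[str]:
    tools, confidence = search_local(query)
    if confidence >= CONFIDENCE_THRESHOLD:
        return tools          # fast path: ~0.01 ms, no API call
    return search_api(query)  # low confidence: fall back to the API
```

The key property is that a confident local hit never touches the network, which is what makes the common case essentially free.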

Search accuracy (180-case benchmark)

| Metric | Score |
| ------ | ----- |
| Top-1  | 91.7% |
| Top-3  | 100%  |
| Top-5  | 100%  |

New tests

  • 17 mocked end-to-end prompt pipeline tests (real local search + real filtering, only OpenAI API mocked)
  • 62 creative prompt tests (casual language, homework, physics, geometry, statistics, graph theory, workspace, canvas, transforms, edge cases)
  • Offline benchmark + latency tests (p99 < 5ms)
  • 23 unit tests for local search, cache, mode switching, category registry

New tooling

  • scripts/compare_search_modes.py for side-by-side mode comparison with disagreement analysis

Configuration

Set the TOOL_SEARCH_MODE environment variable to hybrid (default), local, or api.
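Mode selection from the environment variable might look like the sketch below. The validation logic and default are assumptions based on the description above, not the service's actual code.

```python
import os

# Sketch of TOOL_SEARCH_MODE handling: hybrid is the documented default.
VALID_MODES = {"hybrid", "local", "api"}


def get_search_mode() -> str:
    mode = os.environ.get("TOOL_SEARCH_MODE", "hybrid").lower()
    if mode not in VALID_MODES:
        raise ValueError(
            f"TOOL_SEARCH_MODE must be one of {sorted(VALID_MODES)}, got {mode!r}"
        )
    return mode
```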

Test plan

  • All server tests pass (0 failures)
  • 17 prompt pipeline tests pass (streaming + non-streaming paths, filtering verification)
  • 159 tool search tests pass (benchmark + creative + unit)
  • mypy clean on static/tool_search_service.py
  • Integration smoke test: start app with hybrid mode, send chat messages, verify tools are discovered correctly

🤖 Generated with Claude Code

vl3c added 4 commits February 21, 2026 16:29
Replace the API-only tool search with a fast local keyword/category
search engine that eliminates API latency in the common case (~0.01ms
vs 1-3s per query).

Architecture:
- 13 tool categories with keyword triggers and inverted indices built
  at module load time for O(1) token lookups
- Multi-signal scoring: category boost, name/description index match,
  exact name match, action-verb alignment, and 40+ intent-based
  disambiguation rules
- LRU result cache with 5-minute TTL (100 entries max)
- Lazy OpenAI client — local mode never touches the network
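An inverted index built at module load time, as the commit describes, can be sketched like this. The tool names and descriptions below are made up for illustration; only the indexing pattern reflects the architecture notes above.

```python
from collections import defaultdict

# Illustrative tool registry; the real project has many more tools.
TOOLS = {
    "draw_circle": "draw a circle shape on the canvas",
    "save_workspace": "save the current workspace to disk",
}


def build_index(tools: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase token from tool names/descriptions to tool names."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, description in tools.items():
        text = name.replace("_", " ") + " " + description
        for token in text.lower().split():
            index[token].add(name)  # enables O(1) token -> tools lookup
    return index


INDEX = build_index(TOOLS)  # built once at import time, paid only on startup
```

Per-query work then reduces to a handful of dictionary lookups, which is where the ~0.01 ms figure comes from.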

Search modes (TOOL_SEARCH_MODE env var):
- hybrid (default): local first, falls back to API when confidence low
- local: keyword-only, no API call
- api: original GPT-based semantic search

Also switches API fallback model from gpt-4.1-mini to gpt-5-nano
(8x cheaper input costs).

Accuracy on 180-case benchmark: 91.7% top-1, 100% top-3, 100% top-5.
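An LRU cache with a TTL, matching the "100 entries, 5-minute TTL" behaviour described above, can be sketched with an `OrderedDict`. The class and parameter names here are illustrative, not the service's actual API.

```python
import time
from collections import OrderedDict


class TTLCache:
    """LRU cache whose entries also expire after a fixed time-to-live."""

    def __init__(self, max_entries: int = 100, ttl_seconds: float = 300.0):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self._store: OrderedDict[str, tuple[float, object]] = OrderedDict()

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]      # expired: evict and report a miss
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value) -> None:
        self._store[key] = (time.monotonic(), value)
        self._store.move_to_end(key)
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```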
- test_tool_search_local.py: offline benchmark (190-case dataset),
  latency tests (p99 < 5ms), and 62 creative prompt tests covering
  casual language, homework scenarios, physics/engineering, geometry
  constructions, statistics, graph theory, workspace ops, canvas ops,
  transforms, ambiguous terms, and edge cases
- test_tool_search_service.py: 23 new unit tests for local search,
  cache, mode switching, category registry, and lazy client init;
  update existing API tests for mode-aware fixtures
- test_tool_discovery_live.py: add search_ms and search_mode columns
  to CSV output for latency tracking
- scripts/compare_search_modes.py: side-by-side comparison of search
  modes with disagreement analysis and CSV export
- Reference Manual: add ToolSearchService section with architecture,
  class methods, and environment variable documentation
- README.md: add TOOL_SEARCH_MODE to configuration example
- CLAUDE.md: add TOOL_SEARCH_MODE to .env configuration section
- Project Architecture: update tool count and add tool discovery line
17 tests verify the full pipeline: natural-language prompt → real local
tool search → real filtering → correct tool calls returned to the client.
Only OpenAI API calls are mocked; _intercept_search_tools and
ToolSearchService.search_tools_local run for real with TOOL_SEARCH_MODE=local.

Streaming (14 tests): circle, triangle, derivative, solve, distribution,
descriptive stats, graph, undo, save workspace, rotate, multi-tool,
filtering of irrelevant tools, essential passthrough, no-search passthrough.

Non-streaming (3 tests): o3 reasoning model, gpt-4.1 chat completion,
chat completion with irrelevant tool filtering.
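The shape of one of these pipeline tests might look like the sketch below. Everything here is illustrative: `run_pipeline`, `local_search`, and `api_search` are stand-ins, not the repo's actual functions; only the pattern (force TOOL_SEARCH_MODE=local, let local search run for real, make the API path unreachable) reflects the description above.

```python
import os
from unittest.mock import patch


def local_search(query: str) -> list[str]:
    # Stand-in for the real local search, which runs unmocked in the tests.
    return ["draw_circle"] if "circle" in query.lower() else []


def api_search(query: str) -> list[str]:
    # Stand-in for the OpenAI-backed path; the tests mock this out so
    # any attempt to reach it is a failure.
    raise RuntimeError("unexpected network call")


def run_pipeline(prompt: str) -> list[str]:
    mode = os.environ.get("TOOL_SEARCH_MODE", "hybrid")
    if mode == "local":
        return local_search(prompt)  # never touches the network
    return api_search(prompt)


def test_circle_prompt_discovers_circle_tool():
    with patch.dict(os.environ, {"TOOL_SEARCH_MODE": "local"}):
        assert run_pipeline("draw me a circle") == ["draw_circle"]
```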
@vl3c vl3c merged commit 526680d into main Feb 21, 2026
1 check passed
@vl3c vl3c deleted the pr-46 branch February 21, 2026 15:55
